Introduction
In my vectorization using .NET APIs blog, I describe SIMD datatypes Vector64<T>
and Vector128<T>
that operates on ‘Arm64 hardware intrinsic’ APIs present under System.Runtime.Intrinsics.Arm.AdvSimd and System.Runtime.Intrinsics.Arm.AdvSimd.Arm64 class. In this post I will describe those hardware intrinsic APIs by showing sample code usage along with examples and generated Arm64 code. This will help people in understanding these APIs so they can use them to optimize their .NET code written to target Arm64. Since there are 360 APIs, describing all of them in a single post will be overwhelming. So I have divided these APIs among 8 blogs and will demonstrate 45 APIs in each blog. This is part 1 of that blog series.
Most of the description of these APIs is adapted and referenced from Arm Architecture Reference Manual Armv8, for Armv8-A architecture profile document. You can also refer to the description of SIMD and Floating-point instructions description at Arm developer docs page.
The blog page is programmatically generated and might contain mistakes. If you find any mistake, please leave a comment and I will address it.
APIs covered
1. Abs
Vector64<ushort> Abs(Vector64<short> value)
This method calculates the absolute value of each vector element value
, stores in a result vector and returns the result vector.
private Vector64<ushort> AbsTest(Vector64<short> value)
{
return AdvSimd.Abs(value);
}
// value = <-11, -12, -13, 14>
// Result = <11, 12, 13, 14>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<uint> Abs(Vector64<int> value)
Vector64<byte> Abs(Vector64<sbyte> value)
Vector64<float> Abs(Vector64<float> value)
Vector128<ushort> Abs(Vector128<short> value)
Vector128<uint> Abs(Vector128<int> value)
Vector128<byte> Abs(Vector128<sbyte> value)
Vector128<float> Abs(Vector128<float> value)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> Abs(Vector128<double> value)
Vector128<ulong> Abs(Vector128<long> value)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsTest(System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[UInt16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
abs v16.4h, v0.4h
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
2. AbsoluteCompareGreaterThan
Vector64<float> AbsoluteCompareGreaterThan(Vector64<float> left, Vector64<float> right)
This method performs comparison of absolute value of corresponding vector elements in left
with those of right
vector and if the left
’s value is greater than the right
’s value, sets every bit of the corresponding vector element in the result vector to one, otherwise sets every bit of the corresponding vector element in the result vector to zero and return the result vector.
private Vector64<float> AbsoluteCompareGreaterThanTest(Vector64<float> left, Vector64<float> right)
{
return AdvSimd.AbsoluteCompareGreaterThan(left, right);
}
// left = <-11.5f, -12.5f>
// right = <10.5f, -22.5f>
// Result = <NaN, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> AbsoluteCompareGreaterThan(Vector128<float> left, Vector128<float> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> AbsoluteCompareGreaterThan(Vector128<double> left, Vector128<double> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteCompareGreaterThanTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
facgt v16.2s, v0.2s, v1.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
3. AbsoluteCompareGreaterThanOrEqual
Vector64<float> AbsoluteCompareGreaterThanOrEqual(Vector64<float> left, Vector64<float> right)
This method performs comparison of absolute value of corresponding vector elements in left
with those of right
vector and if the left
’s value is greater than or equal to the right
’s value, sets every bit of the corresponding vector element in the result vector to one, otherwise sets every bit of the corresponding vector element in the result vector to zero and return the result vector.
private Vector64<float> AbsoluteCompareGreaterThanOrEqualTest(Vector64<float> left, Vector64<float> right)
{
return AdvSimd.AbsoluteCompareGreaterThanOrEqual(left, right);
}
// left = <-11.5f, -12.5f>
// right = <11.5f, -22.5f>
// Result = <NaN, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> AbsoluteCompareGreaterThanOrEqual(Vector128<float> left, Vector128<float> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> AbsoluteCompareGreaterThanOrEqual(Vector128<double> left, Vector128<double> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteCompareGreaterThanOrEqualTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
facge v16.2s, v0.2s, v1.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
4. AbsoluteCompareGreaterThanOrEqualScalar
Vector64<double> AbsoluteCompareGreaterThanOrEqualScalar(Vector64<double> left, Vector64<double> right)
This method compares the absolute value of corresponding vector elements of left
and right
vector and if the left
’s element value is greater than or equal to the right
’s element value, sets every bit of the corresponding vector element in the result vector to one, otherwise sets every bit of the corresponding vector element in the result vector to zero and return the result vector.
private Vector64<double> AbsoluteCompareGreaterThanOrEqualScalarTest(Vector64<double> left, Vector64<double> right)
{
return AdvSimd.Arm64.AbsoluteCompareGreaterThanOrEqualScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <NaN>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<float> AbsoluteCompareGreaterThanOrEqualScalar(Vector64<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteCompareGreaterThanOrEqualScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
facge d16, d0, d1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
5. AbsoluteCompareGreaterThanScalar
Vector64<double> AbsoluteCompareGreaterThanScalar(Vector64<double> left, Vector64<double> right)
This method compares the absolute value of corresponding vector elements of left
and right
vector and if the left
’s element value is greater than the right
’s element value, sets every bit of the corresponding vector element in the result vector to one, otherwise sets every bit of the corresponding vector element in the result vector to zero and return the result vector.
private Vector64<double> AbsoluteCompareGreaterThanScalarTest(Vector64<double> left, Vector64<double> right)
{
return AdvSimd.Arm64.AbsoluteCompareGreaterThanScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<float> AbsoluteCompareGreaterThanScalar(Vector64<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteCompareGreaterThanScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
facgt d16, d0, d1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
6. AbsoluteCompareLessThan
Vector64<float> AbsoluteCompareLessThan(Vector64<float> left, Vector64<float> right)
This method performs comparison of absolute value of corresponding vector elements in left
and right
vector and if the left
’s value is less than to the right
’s value, sets every bit of the corresponding vector element in the result vector to one, otherwise sets every bit of the corresponding vector element in the result vector to zero and return the result vector.
private Vector64<float> AbsoluteCompareLessThanTest(Vector64<float> left, Vector64<float> right)
{
return AdvSimd.AbsoluteCompareLessThan(left, right);
}
// left = <-11.5f, -12.5f>
// right = <10.5f, -22.5f>
// Result = <0, NaN>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> AbsoluteCompareLessThan(Vector128<float> left, Vector128<float> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> AbsoluteCompareLessThan(Vector128<double> left, Vector128<double> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteCompareLessThanTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
facgt v16.2s, v1.2s, v0.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
7. AbsoluteCompareLessThanOrEqual
Vector64<float> AbsoluteCompareLessThanOrEqual(Vector64<float> left, Vector64<float> right)
This method performs comparison of absolute value of each vector element in left
with the absolute value of the corresponding vector element in right
and if the left
’s value is less than or equal to the right
’s value, sets every bit of the corresponding vector element in the result vector to one, otherwise sets every bit of the corresponding vector element in the result vector to zero and return the result vector.
private Vector64<float> AbsoluteCompareLessThanOrEqualTest(Vector64<float> left, Vector64<float> right)
{
return AdvSimd.AbsoluteCompareLessThanOrEqual(left, right);
}
// left = <-11.5f, -12.5f>
// right = <11.5f, -22.5f>
// Result = <0, NaN>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> AbsoluteCompareLessThanOrEqual(Vector128<float> left, Vector128<float> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> AbsoluteCompareLessThanOrEqual(Vector128<double> left, Vector128<double> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteCompareLessThanOrEqualTest(System.Runtime.Intrinsics.Vector64`1[Single],System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
facge v16.2s, v1.2s, v0.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
8. AbsoluteCompareLessThanOrEqualScalar
Vector64<double> AbsoluteCompareLessThanOrEqualScalar(Vector64<double> left, Vector64<double> right)
This method compares the absolute value of corresponding vector elements of left
and right
vector and if the left
’s element value is less than or equal to the right
’s element value, sets every bit of the corresponding vector element in the result vector to one, otherwise sets every bit of the corresponding vector element in the result vector to zero and return the result vector.
private Vector64<double> AbsoluteCompareLessThanOrEqualScalarTest(Vector64<double> left, Vector64<double> right)
{
return AdvSimd.Arm64.AbsoluteCompareLessThanOrEqualScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <NaN>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<float> AbsoluteCompareLessThanOrEqualScalar(Vector64<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteCompareLessThanOrEqualScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
facge d16, d1, d0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
9. AbsoluteCompareLessThanScalar
Vector64<double> AbsoluteCompareLessThanScalar(Vector64<double> left, Vector64<double> right)
This method compares the absolute value of corresponding vector elements of left
and right
vector and if the left
’s element value is less than the right
’s element value, sets every bit of the corresponding vector element in the result vector to one, otherwise sets every bit of the corresponding vector element in the result vector to zero and return the result vector.
private Vector64<double> AbsoluteCompareLessThanScalarTest(Vector64<double> left, Vector64<double> right)
{
return AdvSimd.Arm64.AbsoluteCompareLessThanScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<float> AbsoluteCompareLessThanScalar(Vector64<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteCompareLessThanScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
facgt d16, d1, d0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
10. AbsoluteDifference
Vector64<byte> AbsoluteDifference(Vector64<byte> left, Vector64<byte> right)
This method subtracts the corresponding vector elements of right
vector from those of left
vector, places the absolute values of the results in a result vector, and writes the vector to the result vector.
private Vector64<byte> AbsoluteDifferenceTest(Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.AbsoluteDifference(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 37, 17>
// Result = <10, 10, 10, 10, 10, 10, 20, 1>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<ushort> AbsoluteDifference(Vector64<short> left, Vector64<short> right)
Vector64<uint> AbsoluteDifference(Vector64<int> left, Vector64<int> right)
Vector64<byte> AbsoluteDifference(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> AbsoluteDifference(Vector64<float> left, Vector64<float> right)
Vector64<ushort> AbsoluteDifference(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> AbsoluteDifference(Vector64<uint> left, Vector64<uint> right)
Vector128<byte> AbsoluteDifference(Vector128<byte> left, Vector128<byte> right)
Vector128<ushort> AbsoluteDifference(Vector128<short> left, Vector128<short> right)
Vector128<uint> AbsoluteDifference(Vector128<int> left, Vector128<int> right)
Vector128<byte> AbsoluteDifference(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> AbsoluteDifference(Vector128<float> left, Vector128<float> right)
Vector128<ushort> AbsoluteDifference(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> AbsoluteDifference(Vector128<uint> left, Vector128<uint> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> AbsoluteDifference(Vector128<double> left, Vector128<double> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteDifferenceTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uabd v16.8b, v0.8b, v1.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
11. AbsoluteDifferenceAdd
Vector64<byte> AbsoluteDifferenceAdd(Vector64<byte> addend, Vector64<byte> left, Vector64<byte> right)
This method subtracts the corresponding vector elements of the right
vector from those of left
vector, and accumulates the absolute values of the results along with the values of addend
and returns the accumulated result.
private Vector64<byte> AbsoluteDifferenceAddTest(Vector64<byte> addend, Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.AbsoluteDifferenceAdd(addend, left, right);
}
// addend = <11, 12, 13, 14, 15, 16, 17, 18>
// left = <21, 52, 23, 24, 25, 26, 27, 28>
// right = <41, 32, 33, 34, 35, 36, 37, 38>
// Result = <31, 32, 23, 24, 25, 26, 27, 28>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> AbsoluteDifferenceAdd(Vector64<short> addend, Vector64<short> left, Vector64<short> right)
Vector64<int> AbsoluteDifferenceAdd(Vector64<int> addend, Vector64<int> left, Vector64<int> right)
Vector64<sbyte> AbsoluteDifferenceAdd(Vector64<sbyte> addend, Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<ushort> AbsoluteDifferenceAdd(Vector64<ushort> addend, Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> AbsoluteDifferenceAdd(Vector64<uint> addend, Vector64<uint> left, Vector64<uint> right)
Vector128<byte> AbsoluteDifferenceAdd(Vector128<byte> addend, Vector128<byte> left, Vector128<byte> right)
Vector128<short> AbsoluteDifferenceAdd(Vector128<short> addend, Vector128<short> left, Vector128<short> right)
Vector128<int> AbsoluteDifferenceAdd(Vector128<int> addend, Vector128<int> left, Vector128<int> right)
Vector128<sbyte> AbsoluteDifferenceAdd(Vector128<sbyte> addend, Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<ushort> AbsoluteDifferenceAdd(Vector128<ushort> addend, Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> AbsoluteDifferenceAdd(Vector128<uint> addend, Vector128<uint> left, Vector128<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteDifferenceAddTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uaba v0.8b, v1.8b, v2.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
12. AbsoluteDifferenceScalar
Vector64<double> AbsoluteDifferenceScalar(Vector64<double> left, Vector64<double> right)
This method subtracts the floating-point values in the elements of the right
vector from that of left
vector, stores the absolute value of into a result vector, and returns the result vector.
private Vector64<double> AbsoluteDifferenceScalarTest(Vector64<double> left, Vector64<double> right)
{
return AdvSimd.Arm64.AbsoluteDifferenceScalar(left, right);
}
// left = <11.5>
// right = <16.5>
// Result = <5>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<float> AbsoluteDifferenceScalar(Vector64<float> left, Vector64<float> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteDifferenceScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fabd d16, d0, d1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
13. AbsoluteDifferenceWideningLower
Vector128<ushort> AbsoluteDifferenceWideningLower(Vector64<byte> left, Vector64<byte> right)
This method subtracts the corresponding vector elements in the right
from those of left
and places the absolute value in a result vector and returns the result vector. The result vector Vector128<ushort>
as seen in below example is twice the size of input parameter Vector4<byte>
.
private Vector128<ushort> AbsoluteDifferenceWideningLowerTest(Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.AbsoluteDifferenceWideningLower(left, right);
}
// left = <11, 2, 113, 104, 180, 11, 120, 121>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <10, 20, 90, 80, 155, 15, 93, 93>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<uint> AbsoluteDifferenceWideningLower(Vector64<short> left, Vector64<short> right)
Vector128<ulong> AbsoluteDifferenceWideningLower(Vector64<int> left, Vector64<int> right)
Vector128<ushort> AbsoluteDifferenceWideningLower(Vector64<sbyte> left, Vector64<sbyte> right)
Vector128<uint> AbsoluteDifferenceWideningLower(Vector64<ushort> left, Vector64<ushort> right)
Vector128<ulong> AbsoluteDifferenceWideningLower(Vector64<uint> left, Vector64<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteDifferenceWideningLowerTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uabdl v16.8h, v0.8b, v1.8b
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
14. AbsoluteDifferenceWideningLowerAndAdd
Vector128<ushort> AbsoluteDifferenceWideningLowerAndAdd(Vector128<ushort> addend, Vector64<byte> left, Vector64<byte> right)
This method subtracts the corresponding vector elements of right
from that of left
, and accumulates the absolute values of the result along with the elements of addend
and return the accumulated vector. The result vector Vector128<ushort>
as seen in below example is twice the size of input parameter Vector4<byte>
.
private Vector128<ushort> AbsoluteDifferenceWideningLowerAndAddTest(Vector128<ushort> addend, Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.AbsoluteDifferenceWideningLowerAndAdd(addend, left, right);
}
// addend = <100, 200, 300, 100, 100, 100, 100, 100>
// left = <11, 2, 113, 104, 180, 11, 120, 121>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <110, 220, 390, 180, 1155, 115, 193, 193>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> AbsoluteDifferenceWideningLowerAndAdd(Vector128<int> addend, Vector64<short> left, Vector64<short> right)
Vector128<long> AbsoluteDifferenceWideningLowerAndAdd(Vector128<long> addend, Vector64<int> left, Vector64<int> right)
Vector128<short> AbsoluteDifferenceWideningLowerAndAdd(Vector128<short> addend, Vector64<sbyte> left, Vector64<sbyte> right)
Vector128<uint> AbsoluteDifferenceWideningLowerAndAdd(Vector128<uint> addend, Vector64<ushort> left, Vector64<ushort> right)
Vector128<ulong> AbsoluteDifferenceWideningLowerAndAdd(Vector128<ulong> addend, Vector64<uint> left, Vector64<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteDifferenceWideningLowerAndAddTest(System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uabal v0.8h, v1.8b, v2.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
15. AbsoluteDifferenceWideningUpper
Vector128<ushort> AbsoluteDifferenceWideningUpper(Vector128<byte> left, Vector128<byte> right)
This method subtracts the corresponding vector elements in upper half of right
vector from those of left
, places the absolute value of the result in a result vector and returns the result vector. The size of individual element of result Vector128<ushort>
as seen in below example is twice the size of input parmeter Vector128<byte>
.
private Vector128<ushort> AbsoluteDifferenceWideningUpperTest(Vector128<byte> left, Vector128<byte> right)
{
return AdvSimd.AbsoluteDifferenceWideningUpper(left, right);
}
// left = <11, 208, 103, 184, 180, 21, 130, 151, 31, 2, 113, 104, 180, 11, 120, 121>
// right = <21, 22, 23, 24, 25, 26, 27, 28, 20, 122, 231, 24, 25, 26, 27, 28>
// Result = <11, 120, 118, 80, 155, 15, 93, 93>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<uint> AbsoluteDifferenceWideningUpper(Vector128<short> left, Vector128<short> right)
Vector128<ulong> AbsoluteDifferenceWideningUpper(Vector128<int> left, Vector128<int> right)
Vector128<ushort> AbsoluteDifferenceWideningUpper(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<uint> AbsoluteDifferenceWideningUpper(Vector128<ushort> left, Vector128<ushort> right)
Vector128<ulong> AbsoluteDifferenceWideningUpper(Vector128<uint> left, Vector128<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteDifferenceWideningUpperTest(System.Runtime.Intrinsics.Vector128`1[Byte],System.Runtime.Intrinsics.Vector128`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uabdl2 v16.8h, v0.16b, v1.16b
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
16. AbsoluteDifferenceWideningUpperAndAdd
Vector128<ushort> AbsoluteDifferenceWideningUpperAndAdd(Vector128<ushort> addend, Vector128<byte> left, Vector128<byte> right)
This method subtracts the corresponding vector elements in upper half of right
from those of left
, accumulates the absolute value of the result along with addened
and return the accumulated vector. The size of individual element of result Vector128<ushort>
as seen in below example is twice the size of input parmeter Vector128<byte>
.
private Vector128<ushort> AbsoluteDifferenceWideningUpperAndAddTest(Vector128<ushort> addend, Vector128<byte> left, Vector128<byte> right)
{
return AdvSimd.AbsoluteDifferenceWideningUpperAndAdd(addend, left, right);
}
// addend = <100, 200, 300, 100, 100, 100, 100, 100>
// left = <11, 208, 103, 184, 180, 21, 130, 151, 31, 2, 113, 104, 180, 11, 120, 121>
// right = <21, 22, 23, 24, 25, 26, 27, 28, 20, 122, 231, 24, 25, 26, 27, 28>
// Result = <111, 320, 418, 180, 255, 115, 193, 193>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> AbsoluteDifferenceWideningUpperAndAdd(Vector128<int> addend, Vector128<short> left, Vector128<short> right)
Vector128<long> AbsoluteDifferenceWideningUpperAndAdd(Vector128<long> addend, Vector128<int> left, Vector128<int> right)
Vector128<short> AbsoluteDifferenceWideningUpperAndAdd(Vector128<short> addend, Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<uint> AbsoluteDifferenceWideningUpperAndAdd(Vector128<uint> addend, Vector128<ushort> left, Vector128<ushort> right)
Vector128<ulong> AbsoluteDifferenceWideningUpperAndAdd(Vector128<ulong> addend, Vector128<uint> left, Vector128<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsoluteDifferenceWideningUpperAndAddTest(System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector128`1[Byte],System.Runtime.Intrinsics.Vector128`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
; V02 arg2 [V02,T02] ( 3, 3 ) simd16 -> d2 HFA(simd16)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uabal2 v0.8h, v1.16b, v2.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
17. AbsSaturate
Vector64<short> AbsSaturate(Vector64<short> value)
This method calculates saturated absolute value of each vector element of value
. If any element’s absolute value is outside the range, the result is saturated. In below example, 1st lane value is -32768
which is ushort.MinValue
. It’s absolute value would be 32768
, but since it is out of range, it is saturated to 32767
which is ushort.MaxValue
.
private Vector64<short> AbsSaturateTest(Vector64<short> value)
{
return AdvSimd.AbsSaturate(value);
}
// value = <-32768, -12, -13, 32767>
// Result = <32767, 12, 13, 32767>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> AbsSaturate(Vector64<int> value)
Vector64<sbyte> AbsSaturate(Vector64<sbyte> value)
Vector128<short> AbsSaturate(Vector128<short> value)
Vector128<int> AbsSaturate(Vector128<int> value)
Vector128<sbyte> AbsSaturate(Vector128<sbyte> value)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<long> AbsSaturate(Vector128<long> value)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsSaturateTest(System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqabs v16.4h, v0.4h
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
18. AbsSaturateScalar
Vector64<short> AbsSaturateScalar(Vector64<short> value)
This method reads 0th vector element from the value
vector, stores the absolute value of the result into a result vector and returns the result vector. This method operates on signed integer values.
private Vector64<short> AbsSaturateScalarTest(Vector64<short> value)
{
return AdvSimd.Arm64.AbsSaturateScalar(value);
}
// value = <11, 12, 13, 14>
// Result = <11, 0, 0, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<int> AbsSaturateScalar(Vector64<int> value)
Vector64<long> AbsSaturateScalar(Vector64<long> value)
Vector64<sbyte> AbsSaturateScalar(Vector64<sbyte> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[Int16]):System.Runtime.Intrinsics.Vector64`1[Int16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqabs h16, h0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
19. AbsScalar
Vector64<double> AbsScalar(Vector64<double> value)
This method calculates floating-point absolute value, similar to Abs()
and stores them in result vector and return the result vector.
private Vector64<double> AbsScalarTest(Vector64<double> value)
{
return AdvSimd.AbsScalar(value);
}
// value = <-11.5>
// Result = <11.5>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<float> AbsScalar(Vector64<float> value)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<ulong> AbsScalar(Vector64<long> value)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AbsScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fabs d16, d0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
20. Add
Vector64<byte> Add(Vector64<byte> left, Vector64<byte> right)
This method adds the corresponding vector elements in the left
and right
vector, and returns the result vector.
private Vector64<byte> AddTest(Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.Add(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <32, 34, 36, 38, 40, 42, 44, 46>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> Add(Vector64<short> left, Vector64<short> right)
Vector64<int> Add(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> Add(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> Add(Vector64<float> left, Vector64<float> right)
Vector64<ushort> Add(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> Add(Vector64<uint> left, Vector64<uint> right)
Vector128<byte> Add(Vector128<byte> left, Vector128<byte> right)
Vector128<short> Add(Vector128<short> left, Vector128<short> right)
Vector128<int> Add(Vector128<int> left, Vector128<int> right)
Vector128<long> Add(Vector128<long> left, Vector128<long> right)
Vector128<sbyte> Add(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> Add(Vector128<float> left, Vector128<float> right)
Vector128<ushort> Add(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> Add(Vector128<uint> left, Vector128<uint> right)
Vector128<ulong> Add(Vector128<ulong> left, Vector128<ulong> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> Add(Vector128<double> left, Vector128<double> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
add v16.8b, v0.8b, v1.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
21. AddAcross
Vector64<byte> AddAcross(Vector64<byte> value)
This method adds every vector element in the value
vector together, and writes the result to the 0th element of result vector, while other elements of result vector set to 0.
private Vector64<byte> AddAcrossTest(Vector64<byte> value)
{
return AdvSimd.Arm64.AddAcross(value);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// Result = <116, 0, 0, 0, 0, 0, 0, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<short> AddAcross(Vector64<short> value)
Vector64<sbyte> AddAcross(Vector64<sbyte> value)
Vector64<ushort> AddAcross(Vector64<ushort> value)
Vector64<byte> AddAcross(Vector128<byte> value)
Vector64<short> AddAcross(Vector128<short> value)
Vector64<int> AddAcross(Vector128<int> value)
Vector64<sbyte> AddAcross(Vector128<sbyte> value)
Vector64<ushort> AddAcross(Vector128<ushort> value)
Vector64<uint> AddAcross(Vector128<uint> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddAcrossTest(System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
addv b16, v0.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
22. AddAcrossWidening
Vector64<ushort> AddAcrossWidening(Vector64<byte> value)
This method adds every vector element in the value
vector together, and writes the result to the 0th element of result vector, while other elements of result vector set to 0. As seen in below example, the result vector’s element size ushort
is twice as long as the input parameter’s element size byte
.
private Vector64<ushort> AddAcrossWideningTest(Vector64<byte> value)
{
return AdvSimd.Arm64.AddAcrossWidening(value);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// Result = <116, 0, 0, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<int> AddAcrossWidening(Vector64<short> value)
Vector64<short> AddAcrossWidening(Vector64<sbyte> value)
Vector64<uint> AddAcrossWidening(Vector64<ushort> value)
Vector64<ushort> AddAcrossWidening(Vector128<byte> value)
Vector64<int> AddAcrossWidening(Vector128<short> value)
Vector64<long> AddAcrossWidening(Vector128<int> value)
Vector64<short> AddAcrossWidening(Vector128<sbyte> value)
Vector64<uint> AddAcrossWidening(Vector128<ushort> value)
Vector64<ulong> AddAcrossWidening(Vector128<uint> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddAcrossWideningTest(System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[UInt16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uaddlv h16, v0.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
23. AddHighNarrowingLower
Vector64<byte> AddHighNarrowingLower(Vector128<ushort> left, Vector128<ushort> right)
This method adds corresponding vector elements in the left
and right
vector, places the most significant half of the result into the result vector and return the result vector. As seen in below example, elements in result vector Vector64<byte>
is half the size of that of input Vector128<ushort>
although number of total elements are same.
private Vector64<byte> AddHighNarrowingLowerTest(Vector128<ushort> left, Vector128<ushort> right)
{
return AdvSimd.AddHighNarrowingLower(left, right);
}
// left = <100, 200, 300, 400, 500, 600, 700, 800>
// right = <900, 1000, 1100, 1200, 1300, 1400, 1500, 1600>
// Result = <3, 4, 5, 6, 7, 7, 8, 9>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> AddHighNarrowingLower(Vector128<int> left, Vector128<int> right)
Vector64<int> AddHighNarrowingLower(Vector128<long> left, Vector128<long> right)
Vector64<sbyte> AddHighNarrowingLower(Vector128<short> left, Vector128<short> right)
Vector64<ushort> AddHighNarrowingLower(Vector128<uint> left, Vector128<uint> right)
Vector64<uint> AddHighNarrowingLower(Vector128<ulong> left, Vector128<ulong> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddHighNarrowingLowerTest(System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector128`1[UInt16]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
addhn v16.8b, v0.8h, v1.8h
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
24. AddHighNarrowingUpper
Vector128<byte> AddHighNarrowingUpper(Vector64<byte> lower, Vector128<ushort> left, Vector128<ushort> right)
This method adds corresponding vector elements in the left
and right
vector, places the most significant half of the result into upper half of the result vector while the lower half of vector is set to the elements in lower
.
private Vector128<byte> AddHighNarrowingUpperTest(Vector64<byte> lower, Vector128<ushort> left, Vector128<ushort> right)
{
return AdvSimd.AddHighNarrowingUpper(lower, left, right);
}
// lower = <1, 255, 13, 41, 54, 61, 71, 18>
// left = <100, 200, 300, 400, 500, 600, 700, 800>
// right = <900, 1000, 1100, 1200, 1300, 1400, 1500, 1600>
// Result = <1, 255, 13, 41, 54, 61, 71, 18, 3, 4, 5, 6, 7, 7, 8, 9>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<short> AddHighNarrowingUpper(Vector64<short> lower, Vector128<int> left, Vector128<int> right)
Vector128<int> AddHighNarrowingUpper(Vector64<int> lower, Vector128<long> left, Vector128<long> right)
Vector128<sbyte> AddHighNarrowingUpper(Vector64<sbyte> lower, Vector128<short> left, Vector128<short> right)
Vector128<ushort> AddHighNarrowingUpper(Vector64<ushort> lower, Vector128<uint> left, Vector128<uint> right)
Vector128<uint> AddHighNarrowingUpper(Vector64<uint> lower, Vector128<ulong> left, Vector128<ulong> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddHighNarrowingUpperTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector128`1[UInt16]):System.Runtime.Intrinsics.Vector128`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
; V02 arg2 [V02,T02] ( 3, 3 ) simd16 -> d2 HFA(simd16)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
addhn2 v0.16b, v1.8h, v2.8h
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
25. AddPairwise
Vector64<byte> AddPairwise(Vector64<byte> left, Vector64<byte> right)
This method creates a vector by concatenating the vector elements of left
vector followed by those of the right
vector, reads each pair of adjacent vector elements from the concatenated vector, adds each pair of values and places them in result vector and returns the result vector.
private Vector64<byte> AddPairwiseTest(Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.AddPairwise(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <23, 27, 31, 35, 43, 47, 51, 55>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> AddPairwise(Vector64<short> left, Vector64<short> right)
Vector64<int> AddPairwise(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> AddPairwise(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> AddPairwise(Vector64<float> left, Vector64<float> right)
Vector64<ushort> AddPairwise(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> AddPairwise(Vector64<uint> left, Vector64<uint> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<byte> AddPairwise(Vector128<byte> left, Vector128<byte> right)
Vector128<double> AddPairwise(Vector128<double> left, Vector128<double> right)
Vector128<short> AddPairwise(Vector128<short> left, Vector128<short> right)
Vector128<int> AddPairwise(Vector128<int> left, Vector128<int> right)
Vector128<long> AddPairwise(Vector128<long> left, Vector128<long> right)
Vector128<sbyte> AddPairwise(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> AddPairwise(Vector128<float> left, Vector128<float> right)
Vector128<ushort> AddPairwise(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> AddPairwise(Vector128<uint> left, Vector128<uint> right)
Vector128<ulong> AddPairwise(Vector128<ulong> left, Vector128<ulong> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddPairwiseTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
addp v16.8b, v0.8b, v1.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
26. AddPairwiseScalar
Vector64<float> AddPairwiseScalar(Vector64<float> value)
This method adds vector elements in the value
vector and writes the result to the 0th element of result vector, while other elements of result vector set to 0.
private Vector64<float> AddPairwiseScalarTest(Vector64<float> value)
{
return AdvSimd.Arm64.AddPairwiseScalar(value);
}
// value = <11.5, 12.5>
// Result = <24, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<double> AddPairwiseScalar(Vector128<double> value)
Vector64<long> AddPairwiseScalar(Vector128<long> value)
Vector64<ulong> AddPairwiseScalar(Vector128<ulong> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddPairwiseScalarTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
faddp s16, v0.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
27. AddPairwiseWidening
Vector64<ushort> AddPairwiseWidening(Vector64<byte> value)
This method adds pairs of adjacent integer values from the value
vector, stores them in a result vector and returns the vector. As seen in example below, the result vector elements ushort
is twice as long as the input’s vector elements size byte
.
private Vector64<ushort> AddPairwiseWideningTest(Vector64<byte> value)
{
return AdvSimd.AddPairwiseWidening(value);
}
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// Result = <23, 27, 31, 35>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> AddPairwiseWidening(Vector64<short> value)
Vector64<short> AddPairwiseWidening(Vector64<sbyte> value)
Vector64<uint> AddPairwiseWidening(Vector64<ushort> value)
Vector128<ushort> AddPairwiseWidening(Vector128<byte> value)
Vector128<int> AddPairwiseWidening(Vector128<short> value)
Vector128<long> AddPairwiseWidening(Vector128<int> value)
Vector128<short> AddPairwiseWidening(Vector128<sbyte> value)
Vector128<uint> AddPairwiseWidening(Vector128<ushort> value)
Vector128<ulong> AddPairwiseWidening(Vector128<uint> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddPairwiseWideningTest(System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[UInt16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uaddlp v16.4h, v0.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
28. AddPairwiseWideningAndAdd
Vector64<ushort> AddPairwiseWideningAndAdd(Vector64<ushort> addend, Vector64<byte> value)
This method adds pairs of adjacent integer values from the value
vector and accumulates the results with those of addend
vector and return the result vector. As seen in below example, the result vector element size ushort
is twice as long as that of input parameter’s size byte
.
private Vector64<ushort> AddPairwiseWideningAndAddTest(Vector64<ushort> addend, Vector64<byte> value)
{
return AdvSimd.AddPairwiseWideningAndAdd(addend, value);
}
// addend = <11, 12, 13, 14>
// value = <11, 12, 13, 14, 15, 16, 17, 18>
// Result = <34, 39, 44, 49>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<int> AddPairwiseWideningAndAdd(Vector64<int> addend, Vector64<short> value)
Vector64<short> AddPairwiseWideningAndAdd(Vector64<short> addend, Vector64<sbyte> value)
Vector64<uint> AddPairwiseWideningAndAdd(Vector64<uint> addend, Vector64<ushort> value)
Vector128<ushort> AddPairwiseWideningAndAdd(Vector128<ushort> addend, Vector128<byte> value)
Vector128<int> AddPairwiseWideningAndAdd(Vector128<int> addend, Vector128<short> value)
Vector128<long> AddPairwiseWideningAndAdd(Vector128<long> addend, Vector128<int> value)
Vector128<short> AddPairwiseWideningAndAdd(Vector128<short> addend, Vector128<sbyte> value)
Vector128<uint> AddPairwiseWideningAndAdd(Vector128<uint> addend, Vector128<ushort> value)
Vector128<ulong> AddPairwiseWideningAndAdd(Vector128<ulong> addend, Vector128<uint> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddPairwiseWideningAndAddTest(System.Runtime.Intrinsics.Vector64`1[UInt16],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[UInt16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uadalp v0.4h, v1.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
29. AddPairwiseWideningAndAddScalar
Vector64<long> AddPairwiseWideningAndAddScalar(Vector64<long> addend, Vector64<int> value)
This method adds pairs of adjacent integer values from value
vector and accumulates the results with the vector elements of addend
and returns the result vector. As seen in below example, the result vector element’s size long
is twice as long as that of value
vector element’s size int
.
private Vector64<long> AddPairwiseWideningAndAddScalarTest(Vector64<long> addend, Vector64<int> value)
{
return AdvSimd.AddPairwiseWideningAndAddScalar(addend, value);
}
// addend = <11>
// value = <11, 12>
// Result = <34>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<ulong> AddPairwiseWideningAndAddScalar(Vector64<ulong> addend, Vector64<uint> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddPairwiseWideningAndAddScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int32]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sadalp v0.1d, v1.2s
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
30. AddPairwiseWideningScalar
Vector64<long> AddPairwiseWideningScalar(Vector64<int> value)
This method adds pairs of adjacent integer values from the value
vector, stores them in a result vector and returns the result vector. As seen in below example, the result vector element’s size long
is twice as long as the input value
’s element size int
.
private Vector64<long> AddPairwiseWideningScalarTest(Vector64<int> value)
{
return AdvSimd.AddPairwiseWideningScalar(value);
}
// value = <11, 12>
// Result = <23>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<ulong> AddPairwiseWideningScalar(Vector64<uint> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddPairwiseWideningScalarTest(System.Runtime.Intrinsics.Vector64`1[Int32]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
saddlp v16.1d, v0.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
31. AddRoundedHighNarrowingLower
Vector64<byte> AddRoundedHighNarrowingLower(Vector128<ushort> left, Vector128<ushort> right)
This method adds corresponding vector elements in left
vector to those of right
vector, stores the most significant half of the result into the result vector such that the result is rounded and return the result.
private Vector64<byte> AddRoundedHighNarrowingLowerTest(Vector128<ushort> left, Vector128<ushort> right)
{
return AdvSimd.AddRoundedHighNarrowingLower(left, right);
}
// left = <100, 200, 300, 400, 500, 600, 700, 800>
// right = <900, 1000, 1100, 1200, 1300, 1400, 1500, 1600>
// Result = <4, 5, 5, 6, 7, 8, 9, 9>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> AddRoundedHighNarrowingLower(Vector128<int> left, Vector128<int> right)
Vector64<int> AddRoundedHighNarrowingLower(Vector128<long> left, Vector128<long> right)
Vector64<sbyte> AddRoundedHighNarrowingLower(Vector128<short> left, Vector128<short> right)
Vector64<ushort> AddRoundedHighNarrowingLower(Vector128<uint> left, Vector128<uint> right)
Vector64<uint> AddRoundedHighNarrowingLower(Vector128<ulong> left, Vector128<ulong> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddRoundedHighNarrowingLowerTest(System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector128`1[UInt16]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
raddhn v16.8b, v0.8h, v1.8h
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
32. AddRoundedHighNarrowingUpper
Vector128<byte> AddRoundedHighNarrowingUpper(Vector64<byte> lower, Vector128<ushort> left, Vector128<ushort> right)
This method adds corresponding vector elements in left
vector to those of right
vector, places the most significant half of the result (after rounding) into the upper half of the result vector while the lower half is set to the elements in lower
.
private Vector128<byte> AddRoundedHighNarrowingUpperTest(Vector64<byte> lower, Vector128<ushort> left, Vector128<ushort> right)
{
return AdvSimd.AddRoundedHighNarrowingUpper(lower, left, right);
}
// lower = <1, 255, 13, 41, 54, 61, 71, 18>
// left = <100, 200, 300, 400, 500, 600, 700, 800>
// right = <900, 1000, 1100, 1200, 1300, 1400, 1500, 1600>
// Result = <1, 255, 13, 41, 54, 61, 71, 18, 4, 5, 5, 6, 7, 8, 9, 9>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<short> AddRoundedHighNarrowingUpper(Vector64<short> lower, Vector128<int> left, Vector128<int> right)
Vector128<int> AddRoundedHighNarrowingUpper(Vector64<int> lower, Vector128<long> left, Vector128<long> right)
Vector128<sbyte> AddRoundedHighNarrowingUpper(Vector64<sbyte> lower, Vector128<short> left, Vector128<short> right)
Vector128<ushort> AddRoundedHighNarrowingUpper(Vector64<ushort> lower, Vector128<uint> left, Vector128<uint> right)
Vector128<uint> AddRoundedHighNarrowingUpper(Vector64<uint> lower, Vector128<ulong> left, Vector128<ulong> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddRoundedHighNarrowingUpperTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector128`1[UInt16],System.Runtime.Intrinsics.Vector128`1[UInt16]):System.Runtime.Intrinsics.Vector128`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
; V02 arg2 [V02,T02] ( 3, 3 ) simd16 -> d2 HFA(simd16)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
raddhn2 v0.16b, v1.8h, v2.8h
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 20, prolog size 8
33. AddSaturate
Vector64<byte> AddSaturate(Vector64<byte> left, Vector64<byte> right)
This method adds the values of corresponding elements of the left
and right
vectors, stores the results in a vector and returns the result vector. If overflow occurs with any of the results, those results are saturated.
private Vector64<byte> AddSaturateTest(Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.AddSaturate(left, right);
}
// left = <155, 200, 200, 1, 5, 16, 17, 18>
// right = <155, 100, 100, 2, 25, 26, 27, 28>
// Result = <255, 255, 255, 3, 30, 42, 44, 46>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> AddSaturate(Vector64<short> left, Vector64<short> right)
Vector64<int> AddSaturate(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> AddSaturate(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<ushort> AddSaturate(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> AddSaturate(Vector64<uint> left, Vector64<uint> right)
Vector128<byte> AddSaturate(Vector128<byte> left, Vector128<byte> right)
Vector128<short> AddSaturate(Vector128<short> left, Vector128<short> right)
Vector128<int> AddSaturate(Vector128<int> left, Vector128<int> right)
Vector128<long> AddSaturate(Vector128<long> left, Vector128<long> right)
Vector128<sbyte> AddSaturate(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<ushort> AddSaturate(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> AddSaturate(Vector128<uint> left, Vector128<uint> right)
Vector128<ulong> AddSaturate(Vector128<ulong> left, Vector128<ulong> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<byte> AddSaturate(Vector64<byte> left, Vector64<sbyte> right)
Vector64<short> AddSaturate(Vector64<short> left, Vector64<ushort> right)
Vector64<int> AddSaturate(Vector64<int> left, Vector64<uint> right)
Vector64<sbyte> AddSaturate(Vector64<sbyte> left, Vector64<byte> right)
Vector64<ushort> AddSaturate(Vector64<ushort> left, Vector64<short> right)
Vector64<uint> AddSaturate(Vector64<uint> left, Vector64<int> right)
Vector128<byte> AddSaturate(Vector128<byte> left, Vector128<sbyte> right)
Vector128<short> AddSaturate(Vector128<short> left, Vector128<ushort> right)
Vector128<int> AddSaturate(Vector128<int> left, Vector128<uint> right)
Vector128<long> AddSaturate(Vector128<long> left, Vector128<ulong> right)
Vector128<sbyte> AddSaturate(Vector128<sbyte> left, Vector128<byte> right)
Vector128<ushort> AddSaturate(Vector128<ushort> left, Vector128<short> right)
Vector128<uint> AddSaturate(Vector128<uint> left, Vector128<int> right)
Vector128<ulong> AddSaturate(Vector128<ulong> left, Vector128<long> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddSaturateTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uqadd v16.8b, v0.8b, v1.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
34. AddSaturateScalar
Vector64<long> AddSaturateScalar(Vector64<long> left, Vector64<long> right)
This method scalar
variant, adds the values of corresponding elements of the left
and right
vectors, stores the results in a vector and returns the result vector. If overflow occurs with any of the results, those results are saturated.
private Vector64<long> AddSaturateScalarTest(Vector64<long> left, Vector64<long> right)
{
return AdvSimd.AddSaturateScalar(left, right);
}
// left = <11>
// right = <11>
// Result = <22>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<ulong> AddSaturateScalar(Vector64<ulong> left, Vector64<ulong> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<byte> AddSaturateScalar(Vector64<byte> left, Vector64<byte> right)
Vector64<byte> AddSaturateScalar(Vector64<byte> left, Vector64<sbyte> right)
Vector64<short> AddSaturateScalar(Vector64<short> left, Vector64<short> right)
Vector64<short> AddSaturateScalar(Vector64<short> left, Vector64<ushort> right)
Vector64<int> AddSaturateScalar(Vector64<int> left, Vector64<int> right)
Vector64<int> AddSaturateScalar(Vector64<int> left, Vector64<uint> right)
Vector64<long> AddSaturateScalar(Vector64<long> left, Vector64<ulong> right)
Vector64<sbyte> AddSaturateScalar(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<sbyte> AddSaturateScalar(Vector64<sbyte> left, Vector64<byte> right)
Vector64<ushort> AddSaturateScalar(Vector64<ushort> left, Vector64<ushort> right)
Vector64<ushort> AddSaturateScalar(Vector64<ushort> left, Vector64<short> right)
Vector64<uint> AddSaturateScalar(Vector64<uint> left, Vector64<uint> right)
Vector64<uint> AddSaturateScalar(Vector64<uint> left, Vector64<int> right)
Vector64<ulong> AddSaturateScalar(Vector64<ulong> left, Vector64<long> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddSaturateScalarTest(System.Runtime.Intrinsics.Vector64`1[Int64],System.Runtime.Intrinsics.Vector64`1[Int64]):System.Runtime.Intrinsics.Vector64`1[Int64]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
sqadd d16, d0, d1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
35. AddScalar
Vector64<double> AddScalar(Vector64<double> left, Vector64<double> right)
This method adds the floating-point values of the two source vectors, and writes the result to the result. This performs scalar operation.
private Vector64<double> AddScalarTest(Vector64<double> left, Vector64<double> right)
{
return AdvSimd.AddScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <23>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<long> AddScalar(Vector64<long> left, Vector64<long> right)
Vector64<float> AddScalar(Vector64<float> left, Vector64<float> right)
Vector64<ulong> AddScalar(Vector64<ulong> left, Vector64<ulong> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fadd d16, d0, d1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
36. AddWideningLower
Vector128<ushort> AddWideningLower(Vector64<byte> left, Vector64<byte> right)
This method adds corresponding vector elements in the left
to those of right
vector, stores the result in a vector, and returns the result vector. As seen in below example, the result vector element’s size ushort
is twice that of input parameter element size byte
.
private Vector128<ushort> AddWideningLowerTest(Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.AddWideningLower(left, right);
}
// left = <155, 200, 200, 1, 5, 16, 17, 18>
// right = <155, 100, 100, 2, 25, 26, 27, 28>
// Result = <310, 300, 300, 3, 30, 42, 44, 46>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> AddWideningLower(Vector64<short> left, Vector64<short> right)
Vector128<long> AddWideningLower(Vector64<int> left, Vector64<int> right)
Vector128<short> AddWideningLower(Vector64<sbyte> left, Vector64<sbyte> right)
Vector128<uint> AddWideningLower(Vector64<ushort> left, Vector64<ushort> right)
Vector128<ulong> AddWideningLower(Vector64<uint> left, Vector64<uint> right)
Vector128<short> AddWideningLower(Vector128<short> left, Vector64<sbyte> right)
Vector128<int> AddWideningLower(Vector128<int> left, Vector64<short> right)
Vector128<long> AddWideningLower(Vector128<long> left, Vector64<int> right)
Vector128<ushort> AddWideningLower(Vector128<ushort> left, Vector64<byte> right)
Vector128<uint> AddWideningLower(Vector128<uint> left, Vector64<ushort> right)
Vector128<ulong> AddWideningLower(Vector128<ulong> left, Vector64<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddWideningLowerTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uaddl v16.8h, v0.8b, v1.8b
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
37. AddWideningUpper
Vector128<ushort> AddWideningUpper(Vector128<byte> left, Vector128<byte> right)
This method adds corresponding vector elements in the upper half of left
to those of right
vector, stores the result into a result vector, and returns the result vector. As seen in below example, the result vector element’s size ushort
is twice as long as the input parameter’s element size byte
.
private Vector128<ushort> AddWideningUpperTest(Vector128<byte> left, Vector128<byte> right)
{
return AdvSimd.AddWideningUpper(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26>
// right = <21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36>
// Result = <48, 50, 52, 54, 56, 58, 60, 62>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<int> AddWideningUpper(Vector128<short> left, Vector128<short> right)
Vector128<short> AddWideningUpper(Vector128<short> left, Vector128<sbyte> right)
Vector128<int> AddWideningUpper(Vector128<int> left, Vector128<short> right)
Vector128<long> AddWideningUpper(Vector128<int> left, Vector128<int> right)
Vector128<long> AddWideningUpper(Vector128<long> left, Vector128<int> right)
Vector128<short> AddWideningUpper(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<ushort> AddWideningUpper(Vector128<ushort> left, Vector128<byte> right)
Vector128<uint> AddWideningUpper(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> AddWideningUpper(Vector128<uint> left, Vector128<ushort> right)
Vector128<ulong> AddWideningUpper(Vector128<uint> left, Vector128<uint> right)
Vector128<ulong> AddWideningUpper(Vector128<ulong> left, Vector128<uint> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AddWideningUpperTest(System.Runtime.Intrinsics.Vector128`1[Byte],System.Runtime.Intrinsics.Vector128`1[Byte]):System.Runtime.Intrinsics.Vector128`1[UInt16]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd16 -> d0 HFA(simd16)
; V01 arg1 [V01,T01] ( 3, 3 ) simd16 -> d1 HFA(simd16)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
uaddl2 v16.8h, v0.16b, v1.16b
mov v0.16b, v16.16b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
38. And
Vector64<byte> And(Vector64<byte> left, Vector64<byte> right)
This method ands the vector elements in the left
and right
vector, and returns the result vector.
private Vector64<byte> AndTest(Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.And(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <1, 4, 5, 8, 9, 16, 17, 16>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<double> And(Vector64<double> left, Vector64<double> right)
Vector64<short> And(Vector64<short> left, Vector64<short> right)
Vector64<int> And(Vector64<int> left, Vector64<int> right)
Vector64<long> And(Vector64<long> left, Vector64<long> right)
Vector64<sbyte> And(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> And(Vector64<float> left, Vector64<float> right)
Vector64<ushort> And(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> And(Vector64<uint> left, Vector64<uint> right)
Vector64<ulong> And(Vector64<ulong> left, Vector64<ulong> right)
Vector128<byte> And(Vector128<byte> left, Vector128<byte> right)
Vector128<double> And(Vector128<double> left, Vector128<double> right)
Vector128<short> And(Vector128<short> left, Vector128<short> right)
Vector128<int> And(Vector128<int> left, Vector128<int> right)
Vector128<long> And(Vector128<long> left, Vector128<long> right)
Vector128<sbyte> And(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> And(Vector128<float> left, Vector128<float> right)
Vector128<ushort> And(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> And(Vector128<uint> left, Vector128<uint> right)
Vector128<ulong> And(Vector128<ulong> left, Vector128<ulong> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:AndTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
and v16.8b, v0.8b, v1.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
39. BitwiseClear
Vector64<byte> BitwiseClear(Vector64<byte> value, Vector64<byte> mask)
This method performs AND of corresponding vector elements in value
and complement of mask
vector and returns the result vector containing the result of this operation.
private Vector64<byte> BitwiseClearTest(Vector64<byte> value, Vector64<byte> mask)
{
return AdvSimd.BitwiseClear(value, mask);
}
// value = <255, 255, 255, 255, 255, 255, 255, 255>
// mask = <1, 2, 4, 8, 16, 32, 64, 128>
// Result = <254, 253, 251, 247, 239, 223, 191, 127>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<double> BitwiseClear(Vector64<double> value, Vector64<double> mask)
Vector64<short> BitwiseClear(Vector64<short> value, Vector64<short> mask)
Vector64<int> BitwiseClear(Vector64<int> value, Vector64<int> mask)
Vector64<long> BitwiseClear(Vector64<long> value, Vector64<long> mask)
Vector64<sbyte> BitwiseClear(Vector64<sbyte> value, Vector64<sbyte> mask)
Vector64<float> BitwiseClear(Vector64<float> value, Vector64<float> mask)
Vector64<ushort> BitwiseClear(Vector64<ushort> value, Vector64<ushort> mask)
Vector64<uint> BitwiseClear(Vector64<uint> value, Vector64<uint> mask)
Vector64<ulong> BitwiseClear(Vector64<ulong> value, Vector64<ulong> mask)
Vector128<byte> BitwiseClear(Vector128<byte> value, Vector128<byte> mask)
Vector128<double> BitwiseClear(Vector128<double> value, Vector128<double> mask)
Vector128<short> BitwiseClear(Vector128<short> value, Vector128<short> mask)
Vector128<int> BitwiseClear(Vector128<int> value, Vector128<int> mask)
Vector128<long> BitwiseClear(Vector128<long> value, Vector128<long> mask)
Vector128<sbyte> BitwiseClear(Vector128<sbyte> value, Vector128<sbyte> mask)
Vector128<float> BitwiseClear(Vector128<float> value, Vector128<float> mask)
Vector128<ushort> BitwiseClear(Vector128<ushort> value, Vector128<ushort> mask)
Vector128<uint> BitwiseClear(Vector128<uint> value, Vector128<uint> mask)
Vector128<ulong> BitwiseClear(Vector128<ulong> value, Vector128<ulong> mask)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:BitwiseClearTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
bic v16.8b, v0.8b, v1.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
40. BitwiseSelect
Vector64<byte> BitwiseSelect(Vector64<byte> select, Vector64<byte> left, Vector64<byte> right)
This method sets each bit in the result to the corresponding bit from the left
vector when the select
vector’s bit was 1, otherwise from the right
vector.
private Vector64<byte> BitwiseSelectTest(Vector64<byte> select, Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.BitwiseSelect(select, left, right);
}
// select = <11, 12, 13, 14, 15, 16, 17, 18>
// left = <21, 22, 23, 24, 25, 26, 27, 28>
// right = <31, 32, 33, 34, 35, 36, 37, 38>
// Result = <21, 36, 37, 40, 41, 52, 53, 52>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<double> BitwiseSelect(Vector64<double> select, Vector64<double> left, Vector64<double> right)
Vector64<short> BitwiseSelect(Vector64<short> select, Vector64<short> left, Vector64<short> right)
Vector64<int> BitwiseSelect(Vector64<int> select, Vector64<int> left, Vector64<int> right)
Vector64<long> BitwiseSelect(Vector64<long> select, Vector64<long> left, Vector64<long> right)
Vector64<sbyte> BitwiseSelect(Vector64<sbyte> select, Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> BitwiseSelect(Vector64<float> select, Vector64<float> left, Vector64<float> right)
Vector64<ushort> BitwiseSelect(Vector64<ushort> select, Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> BitwiseSelect(Vector64<uint> select, Vector64<uint> left, Vector64<uint> right)
Vector64<ulong> BitwiseSelect(Vector64<ulong> select, Vector64<ulong> left, Vector64<ulong> right)
Vector128<byte> BitwiseSelect(Vector128<byte> select, Vector128<byte> left, Vector128<byte> right)
Vector128<double> BitwiseSelect(Vector128<double> select, Vector128<double> left, Vector128<double> right)
Vector128<short> BitwiseSelect(Vector128<short> select, Vector128<short> left, Vector128<short> right)
Vector128<int> BitwiseSelect(Vector128<int> select, Vector128<int> left, Vector128<int> right)
Vector128<long> BitwiseSelect(Vector128<long> select, Vector128<long> left, Vector128<long> right)
Vector128<sbyte> BitwiseSelect(Vector128<sbyte> select, Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> BitwiseSelect(Vector128<float> select, Vector128<float> left, Vector128<float> right)
Vector128<ushort> BitwiseSelect(Vector128<ushort> select, Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> BitwiseSelect(Vector128<uint> select, Vector128<uint> left, Vector128<uint> right)
Vector128<ulong> BitwiseSelect(Vector128<ulong> select, Vector128<ulong> left, Vector128<ulong> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:BitwiseSelectTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
; V02 arg2 [V02,T02] ( 3, 3 ) simd8 -> d2 HFA(simd8)
;# V03 OutArgs [V03 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
mov v16.8b, v0.8b
bsl v16.8b, v1.8b, v2.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 28, prolog size 8
41. Ceiling
Vector64<float> Ceiling(Vector64<float> value)
This method rounds each vector element of value
having floating-point values to integral floating-point values of the same size using the Round towards Plus Infinity rounding mode, and returns the result. As per ARM docs, a zero input gives a zero result with the same sign, an infinite input gives an infinite result with the same sign, and a NaN is propagated as for normal arithmetic.
private Vector64<float> CeilingTest(Vector64<float> value)
{
return AdvSimd.Ceiling(value);
}
// value = <11.5, 12.5>
// Result = <12, 13>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector128<float> Ceiling(Vector128<float> value)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> Ceiling(Vector128<double> value)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:CeilingTest(System.Runtime.Intrinsics.Vector64`1[Single]):System.Runtime.Intrinsics.Vector64`1[Single]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
frintp v16.2s, v0.2s
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
42. CeilingScalar
Vector64<double> CeilingScalar(Vector64<double> value)
Same as Ceiling
above but operates at scalar level.
private Vector64<double> CeilingScalarTest(Vector64<double> value)
{
return AdvSimd.CeilingScalar(value);
}
// value = <11.5>
// Result = <12>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<float> CeilingScalar(Vector64<float> value)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:CeilingScalarTest(System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
;# V01 OutArgs [V01 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
frintp d16, d0
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
43. CompareEqual
Vector64<byte> CompareEqual(Vector64<byte> left, Vector64<byte> right)
This method compares corresponding vector elements from left
with those in right
, and if the comparison is equal sets every bit of the corresponding vector element in the result vector to one, otherwise sets every bit of the corresponding vector element in the result vector to zero and returns the result vector.
private Vector64<byte> CompareEqualTest(Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.CompareEqual(left, right);
}
// left = <11, 12, 13, 14, 15, 16, 17, 18>
// right = <11, 22,13, 14, 25, 26, 27, 28>
// Result = <255, 0, 255, 255, 0, 0, 0, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> CompareEqual(Vector64<short> left, Vector64<short> right)
Vector64<int> CompareEqual(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> CompareEqual(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> CompareEqual(Vector64<float> left, Vector64<float> right)
Vector64<ushort> CompareEqual(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> CompareEqual(Vector64<uint> left, Vector64<uint> right)
Vector128<byte> CompareEqual(Vector128<byte> left, Vector128<byte> right)
Vector128<short> CompareEqual(Vector128<short> left, Vector128<short> right)
Vector128<int> CompareEqual(Vector128<int> left, Vector128<int> right)
Vector128<sbyte> CompareEqual(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> CompareEqual(Vector128<float> left, Vector128<float> right)
Vector128<ushort> CompareEqual(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> CompareEqual(Vector128<uint> left, Vector128<uint> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> CompareEqual(Vector128<double> left, Vector128<double> right)
Vector128<long> CompareEqual(Vector128<long> left, Vector128<long> right)
Vector128<ulong> CompareEqual(Vector128<ulong> left, Vector128<ulong> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:CompareEqualTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
cmeq v16.8b, v0.8b, v1.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
44. CompareEqualScalar
Vector64<double> CompareEqualScalar(Vector64<double> left, Vector64<double> right)
This method compares corresponding floating-point values from the left
and right
vector, and if the comparison is equal sets every bit of the corresponding vector element in the result vector to one, otherwise sets every bit of the corresponding vector element in the result vector to zero and return the result vector.
private Vector64<double> CompareEqualScalarTest(Vector64<double> left, Vector64<double> right)
{
return AdvSimd.Arm64.CompareEqualScalar(left, right);
}
// left = <11.5>
// right = <11.5>
// Result = <NaN>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector64<long> CompareEqualScalar(Vector64<long> left, Vector64<long> right)
Vector64<float> CompareEqualScalar(Vector64<float> left, Vector64<float> right)
Vector64<ulong> CompareEqualScalar(Vector64<ulong> left, Vector64<ulong> right)
See Microsoft docs here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:CompareEqualScalarTest(System.Runtime.Intrinsics.Vector64`1[Double],System.Runtime.Intrinsics.Vector64`1[Double]):System.Runtime.Intrinsics.Vector64`1[Double]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
fcmeq d16, d0, d1
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8
45. CompareGreaterThan
Vector64<byte> CompareGreaterThan(Vector64<byte> left, Vector64<byte> right)
This method compares corresponding vector elements in the left
and right
vector, and if the left
’s value is greater than the right
’s value sets every bit of the corresponding vector element in the result vector to one, otherwise sets every bit of the corresponding vector element in the result vector to zero and return the result vector.
private Vector64<byte> CompareGreaterThanTest(Vector64<byte> left, Vector64<byte> right)
{
return AdvSimd.CompareGreaterThan(left, right);
}
// left = <31, 12, 33, 34, 15, 16, 17, 18>
// right = <21, 22, 23, 24, 25, 26, 27, 28>
// Result = <255, 0, 255, 255, 0, 0, 0, 0>
Similar APIs that operate on different sizes:
// class System.Runtime.Intrinisics.AdvSimd
Vector64<short> CompareGreaterThan(Vector64<short> left, Vector64<short> right)
Vector64<int> CompareGreaterThan(Vector64<int> left, Vector64<int> right)
Vector64<sbyte> CompareGreaterThan(Vector64<sbyte> left, Vector64<sbyte> right)
Vector64<float> CompareGreaterThan(Vector64<float> left, Vector64<float> right)
Vector64<ushort> CompareGreaterThan(Vector64<ushort> left, Vector64<ushort> right)
Vector64<uint> CompareGreaterThan(Vector64<uint> left, Vector64<uint> right)
Vector128<byte> CompareGreaterThan(Vector128<byte> left, Vector128<byte> right)
Vector128<short> CompareGreaterThan(Vector128<short> left, Vector128<short> right)
Vector128<int> CompareGreaterThan(Vector128<int> left, Vector128<int> right)
Vector128<sbyte> CompareGreaterThan(Vector128<sbyte> left, Vector128<sbyte> right)
Vector128<float> CompareGreaterThan(Vector128<float> left, Vector128<float> right)
Vector128<ushort> CompareGreaterThan(Vector128<ushort> left, Vector128<ushort> right)
Vector128<uint> CompareGreaterThan(Vector128<uint> left, Vector128<uint> right)
// class System.Runtime.Intrinisics.AdvSimd.Arm64
Vector128<double> CompareGreaterThan(Vector128<double> left, Vector128<double> right)
Vector128<long> CompareGreaterThan(Vector128<long> left, Vector128<long> right)
Vector128<ulong> CompareGreaterThan(Vector128<ulong> left, Vector128<ulong> right)
See Microsoft docs here and here, ARM docs here.
Assembly generated:
; Assembly listing for method AdvSimdMethods:CompareGreaterThanTest(System.Runtime.Intrinsics.Vector64`1[Byte],System.Runtime.Intrinsics.Vector64`1[Byte]):System.Runtime.Intrinsics.Vector64`1[Byte]
;
; V00 arg0 [V00,T00] ( 3, 3 ) simd8 -> d0 HFA(simd8)
; V01 arg1 [V01,T01] ( 3, 3 ) simd8 -> d1 HFA(simd8)
;# V02 OutArgs [V02 ] ( 1, 1 ) lclBlk ( 0) [sp+0x00] "OutgoingArgSpace"
; Lcl frame size = 0
stp fp, lr, [sp,#-16]!
mov fp, sp
cmhi v16.8b, v0.8b, v1.8b
mov v0.8b, v16.8b
ldp fp, lr, [sp],#16
ret lr
; Total bytes of code 24, prolog size 8